Abstract

We consider the length $L$ of the longest common subsequence of two randomly, uniformly and independently chosen $n$-character words over a $k$-ary alphabet. Subadditivity arguments yield that $\mathbb{E}[L]/n$ converges to a constant $\gamma_k$. We prove a conjecture of Sankoff and Mainville from the early 1980s claiming that $\gamma_k\sqrt{k} \to 2$ as $k \to \infty$.



1 Introduction 

Consider two sequences of length $n$, say $\xi$ and $\xi'$, with letters from an alphabet $\Sigma$ of size $k$. The longest common subsequence (LCS) problem is that of finding the largest value $L$ for which there are $1 \le i_1 < i_2 < \cdots < i_L \le n$ and $1 \le j_1 < j_2 < \cdots < j_L \le n$ such that $\xi_{i_t} = \xi'_{j_t}$ for all $t = 1, 2, \ldots, L$.
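The value $L$ can be computed with the textbook dynamic program over pairs of prefixes. The following is a minimal Python sketch (the function name `lcs_length` is ours, not the paper's); it keeps a single row of the table, so it runs in $O(n^2)$ time and $O(n)$ memory.

```python
from typing import Sequence


def lcs_length(xi: Sequence, xi_prime: Sequence) -> int:
    """Length L of a longest common subsequence of xi and xi_prime (classic O(n*m) DP)."""
    m = len(xi_prime)
    prev = [0] * (m + 1)  # prev[j] = LCS length of the prefix of xi read so far and xi_prime[:j]
    for x in xi:
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if x == xi_prime[j - 1]:
                cur[j] = prev[j - 1] + 1  # match x with xi_prime[j-1]
            else:
                cur[j] = max(prev[j], cur[j - 1])  # skip a character of one of the words
        prev = cur
    return prev[m]


print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. "BCBA"
```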

The LCS problem has emerged more or less independently in several remarkably disparate 
areas, including the comparison of versions of computer programs, cryptographic snooping, and 
molecular biology. The biological motivation of the problem is that long molecules such as proteins 
and nucleic acids like DNA can be schematically represented as sequences from a finite alphabet. 
Taking an evolutionary point of view, it is natural to compare two DNA sequences by finding their 
closest common ancestors. If one assumes that these molecules evolve only through the process of inserting new symbols in the representing strings, then ancestors are subsequences of the strings that represent the molecules. Thus, the length of the longest common subsequence of two strings is a reasonable measure of how close both strings are. In the mid-1970s, Chvátal and Sankoff [?] proved that the expected length of the LCS of two random $k$-ary sequences of length $n$, when normalized by $n$, converges to a constant. The value of this constant $\gamma_k$ is unknown, although much effort has been spent in finding good upper and lower bounds for it (see, for example, [?] and references therein). The best known upper and lower bounds for $\gamma_k$ do not have a closed form. They are obtained either as numeric approximations to the solutions of a nonlinear equation or as numeric evaluations of some series expansion (see [?] for a survey of such results).

Dept. Ing. Matemática, U. Chile.

Although the problem of determining $\gamma_k$ has a simple statement, it has turned out to be a challenging mathematical endeavor. Moreover, it is quite naturally motivated. Indeed, a claim that two DNA sequences of length $n$ are far apart makes sense provided their LCS differs significantly from $\gamma_4 n$ (since DNA sequences are built from 4 nucleotides).

We analyze the behavior of $\gamma_k$ for $k$ tending to infinity, and more generally, we consider the expected length of the LCS when $k$ is an (arbitrarily slowly growing) function of $n$ and $n \to \infty$. The focus on the case where $k$ grows with $n$ is partly inspired by the work of Kiwi and Loebl [13].
For a bipartite graph $G$ over two totally ordered color classes $A$ and $B$ of size $n$, they considered

$$L(G) = \max\{L : \exists\, a_1 < \cdots < a_L,\ b_1 < \cdots < b_L,\ a_ib_i \in E(G),\ 1 \le i \le L\},$$

and studied its behavior when $G$ is uniformly chosen among all possible $d$-regular bipartite graphs on $A$ and $B$. They established that $L(G)/\sqrt{dn} \to 2$ as $n \to \infty$ provided $d = o(n^{1/4})$. Under this latter condition, any node of the $d$-regular bipartite graph can potentially be matched to a $d/n \to 0$ fraction of the nodes of the other color class. In the case of interest here, that is, the LCS problem with $k \to \infty$, it also happens that any character of either sequence can be matched to an expected $1/k \to 0$ fraction of the other sequence's characters. Both in this work and in [?], the vanishing fraction of (expected) potential matches is a key issue.

In this paper we confirm a conjecture of Sankoff and Mainville from the early 1980s [?] stating that

$$\lim_{k\to\infty} \gamma_k\sqrt{k} = 2. \qquad (1)$$

(See [16, § 6.8] for a discussion of work on lower and upper bounds on $\gamma_k$ as well as a stronger version, due to Arratia and Steele, of the above stated conjecture.)

The constant 2 in (1) arises from a connection with another celebrated problem known as the longest increasing sequence (LIS) problem. The problem is also referred to as "Ulam's problem" (e.g., in [12], [?]). Some (e.g., [?]) incorrectly credit Ulam for raising it in [20], where he mentions (without reference) a "well-known theorem" asserting that given $n^2 + 1$ integers in any order, it is always possible to find among them a monotone subsequence of length $n + 1$. The theorem is due to Erdős and Szekeres [7]. The discussion in [20] concerns only the behavior of the longest monotone subsequence of a randomly and uniformly chosen permutation of $n^2 + 1$ elements. Monte Carlo simulations are reported in [2], where it is observed that over the range $n < 100$, the LIS of $n^2 + 1$ randomly chosen elements, when normalized by $n$, approaches 2. Hammersley [?] gave a rigorous proof of the existence of the limit and conjectured that it equals 2. Later, Logan and Shepp [14], based on a result by Schensted [?], proved that the limit is at least 2; finally, Vershik and Kerov [?] showed that it is at most 2. In a major recent breakthrough, Baik, Deift and Johansson [?] determined the asymptotic distribution of the length of the longest increasing subsequence. For a detailed account of these results, history and related work see the surveys of Aldous and Diaconis [1] and Stanley [?].
It has been speculated that the behavior of the longest strictly/weakly increasing subsequence of a uniform random word of length $n$, with letters from $\Sigma$, may have "connections with the subject of sequence comparison statistics, motivated by DNA sequence matching ..." [?]. Our work reinforces this speculation and in fact does more: it partly elucidates the nature of the connection and the conditions under which sequence matching statistics relate to the behavior of longest increasing sequences.



2 Statement of Results 

Let $A$ and $B$ henceforth denote two disjoint totally ordered sets. We assume that the elements of $A$ are numbered $1, 2, \ldots, |A|$ and those of $B$ are numbered $1, 2, \ldots, |B|$. We write $r = |A|$ and $s = |B|$. Typically, we have $r = s = n$.

Now, let $G$ be a bipartite graph with color classes $A$ and $B$. Two distinct edges $ab$ and $a'b'$ of $G$ are said to be noncrossing if $a$ and $a'$ are in the same order as $b$ and $b'$; in other words, if $a < a'$ and $b < b'$, or $a' < a$ and $b' < b$. A matching of $G$ is called planar if every distinct pair of its edges is noncrossing. We let $L(G)$ denote the number of edges of a maximum size planar matching in $G$ (note that $L(G)$ depends on the graph $G$ and on the ordering of its color classes).
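$L(G)$ can be computed by the same kind of recursion as the LCS: scanning prefixes $\{1,\ldots,i\}\subseteq A$ and $\{1,\ldots,j\}\subseteq B$, an edge $ij$ can be appended to any planar matching that uses only smaller nodes on both sides. A minimal sketch, assuming edges are given as pairs `(a, b)` with nodes numbered from 1 (the function name is ours):

```python
from typing import Iterable, Tuple


def max_planar_matching(r: int, s: int, edges: Iterable[Tuple[int, int]]) -> int:
    """L(G) for a bipartite graph with A = {1..r}, B = {1..s} and the given edge pairs (a, b)."""
    edge_set = set(edges)
    # best[i][j] = L of the subgraph induced by {1..i} in A and {1..j} in B
    best = [[0] * (s + 1) for _ in range(r + 1)]
    for i in range(1, r + 1):
        for j in range(1, s + 1):
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            if (i, j) in edge_set:
                # use ij as the last edge; all other edges then lie in {1..i-1} x {1..j-1}
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + 1)
    return best[r][s]


print(max_planar_matching(2, 2, [(1, 2), (2, 1)]))  # 1: the two edges cross
print(max_planar_matching(2, 2, [(1, 1), (2, 2)]))  # 2: the two edges are noncrossing
```

For a graph from the random words model defined below, this returns the LCS of the two underlying words.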

We will focus on the following two models of random graphs: 

- The random words model $\Sigma(K_{n,n};k)$: the distribution over the set of subgraphs of $K_{n,n}$ obtained by uniformly and independently assigning each node of $K_{n,n}$ one of $k$ characters and keeping those edges whose endpoints are associated to equal characters. Note that only disjoint unions of complete bipartite graphs may appear in this model.

- The binomial random graph model $\mathcal{G}(K_{n,n};p)$: the distribution over the set of subgraphs of $K_{n,n}$ where each edge of $K_{n,n}$ is included with probability $p$, and these events are mutually independent. (This is an obvious modification of the usual $\mathcal{G}(n,p)$ model for bipartite graphs with ordered color classes.)
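Both models are easy to sample. The sketch below (hypothetical helper names; only Python's standard `random` module) returns edge sets as pairs `(a, b)` with nodes numbered from 1, following the conventions above.

```python
import random
from typing import Set, Tuple

Edge = Tuple[int, int]


def random_words_graph(r: int, s: int, k: int, rng: random.Random) -> Set[Edge]:
    """Sample Sigma(K_{r,s}; k): give every node a uniform character from {0..k-1}
    and keep exactly the edges whose two endpoints received the same character."""
    xi = [rng.randrange(k) for _ in range(r)]        # characters on A = {1..r}
    xi_prime = [rng.randrange(k) for _ in range(s)]  # characters on B = {1..s}
    return {(a + 1, b + 1) for a in range(r) for b in range(s) if xi[a] == xi_prime[b]}


def binomial_graph(r: int, s: int, p: float, rng: random.Random) -> Set[Edge]:
    """Sample G(K_{r,s}; p): keep each of the r*s edges independently with probability p."""
    return {(a, b) for a in range(1, r + 1) for b in range(1, s + 1) if rng.random() < p}


rng = random.Random(0)
print(sorted(random_words_graph(4, 4, 3, rng)))
print(sorted(binomial_graph(4, 4, 0.25, rng)))
```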

In order to keep the presentation simple, we first formulate and prove the results for the random words model. Then, in Section 7 we state analogous results for the binomial random graph model. The proofs of these results are almost identical to the case of the random words model, and we only briefly comment on them.

Our results essentially say that $L(\Sigma(K_{n,n};k))\cdot\sqrt{k}/n$ converges to 2 as $k \to \infty$, provided that $n$ is sufficiently large in terms of $k$.

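Theorem 1 For every $\varepsilon > 0$ there exist $k_0$ and $C$ such that for all $k \ge k_0$ and all $n$ with $n/\sqrt{k} \ge C$ we have

$$(1-\varepsilon)\cdot\frac{2n}{\sqrt{k}} \le \mathbb{E}[L(\Sigma(K_{n,n};k))] \le (1+\varepsilon)\cdot\frac{2n}{\sqrt{k}}.$$

Moreover, there is an exponentially small tail bound; namely, for every $\varepsilon > 0$ there exists $c > 0$ such that for $k$ and $n$ as above,

$$\Pr\Bigl[\Bigl|L(\Sigma(K_{n,n};k)) - \frac{2n}{\sqrt{k}}\Bigr| \ge \varepsilon\,\frac{2n}{\sqrt{k}}\Bigr] \le e^{-cn/\sqrt{k}}.$$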



Corollary 2 The limit $\gamma_k = \lim_{n\to\infty}\mathbb{E}[L(\Sigma(K_{n,n};k))/n]$ exists, and

$$\lim_{k\to\infty}\gamma_k\sqrt{k} = 2.$$
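Corollary 2 can be illustrated (not verified) numerically. The sketch below estimates $\mathbb{E}[L]\sqrt{k}/(2n)$ for random words by Monte Carlo. Theorem 1 suggests values close to 1 when $k$ and $n/\sqrt{k}$ are both large; for moderate sizes, finite-size effects are to be expected. The parameter values and function names are our own illustrative choices.

```python
import math
import random


def lcs_length(u, v) -> int:
    """Classic O(len(u) * len(v)) LCS dynamic program with a single rolling row."""
    prev = [0] * (len(v) + 1)
    for x in u:
        cur = [0] * (len(v) + 1)
        for j, y in enumerate(v, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def normalized_lcs(n: int, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[L(Sigma(K_{n,n}; k))] * sqrt(k) / (2n)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        u = [rng.randrange(k) for _ in range(n)]
        v = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(u, v)
    return (total / trials) * math.sqrt(k) / (2 * n)


for k in (16, 64, 256):
    print(k, round(normalized_lcs(n=300, k=k, trials=3), 3))
```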
3 Tools



The crucial ingredient in our proofs is a sufficiently precise result on the distribution of the length of the longest increasing subsequence in a random permutation. We state a remarkably strong result of Baik, Deift and Johansson [?, eqn. (1.7) and (1.8)] (our formulation is slightly weaker than theirs, in order to make the statement simpler). A much weaker tail bound than the one they provide would actually suffice for our proof.

Theorem 3 Let $\mathrm{LIS}_N$ be the random variable corresponding to the length of the longest increasing subsequence of a randomly chosen permutation of $\{1, \ldots, N\}$. There are positive constants $B_0$, $B_1$, and $c$ such that for every $\lambda$ with $B_0/N^{1/3} \le \lambda \le \sqrt{N} - 2$,

$$\Pr[\mathrm{LIS}_N \ge 2\sqrt{N} + \lambda\sqrt{N}] \le B_1\exp\bigl(-c\lambda^{3/2}N^{1/2}\bigr),$$

and for every $\lambda$ with $B_0/N^{1/3} \le \lambda < 2$,

$$\Pr[\mathrm{LIS}_N \le 2\sqrt{N} - \lambda\sqrt{N}] \le B_1\exp\bigl(-c\lambda^{3}N\bigr).$$
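For experimenting with Theorem 3, $\mathrm{LIS}_N$ of a given permutation can be computed in $O(N\log N)$ time by patience sorting. A minimal sketch using only the standard library (names are ours):

```python
import bisect
import math
import random


def lis_length(seq) -> int:
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[i] = smallest last element of an increasing subsequence of length i + 1
    for x in seq:
        i = bisect.bisect_left(tails, x)  # bisect_left (not bisect_right) enforces strict increase
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


rng = random.Random(1)
N = 100_000
perm = list(range(1, N + 1))
rng.shuffle(perm)
# Theorem 3: LIS_N concentrates around 2*sqrt(N); deviations of order lambda*sqrt(N) are exponentially rare.
print(lis_length(perm), 2 * math.sqrt(N))
```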

We will also need a suitable version of Talagrand's inequality; see, e.g., [10, Theorem 2.29].

Theorem 4 (Talagrand's inequality) Suppose that $Z_1, \ldots, Z_N$ are independent random variables taking their values in some set $\Lambda$. Let $X = f(Z_1, \ldots, Z_N)$, where $f : \Lambda^N \to \mathbb{R}$ is a function such that the following two conditions hold for some number $c$ and a function $\psi$:

(L) If $z, z' \in \Lambda^N$ differ only in one coordinate, then $|f(z) - f(z')| \le c$.

(W) If $z \in \Lambda^N$ and $r \in \mathbb{R}$ with $f(z) \ge r$, then there exists a witness $(w_j : j \in J)$, $J \subseteq \{1, \ldots, N\}$, $|J| \le \psi(r)/c^2$, such that for all $y \in \Lambda^N$ with $y_i = w_i$ when $i \in J$, we have $f(y) \ge r$.

Let $m$ be a median of $X$. Then, for all $t \ge 0$,

$$\Pr[X \ge m + t] \le 2e^{-t^2/4\psi(m+t)}$$

and

$$\Pr[X \le m - t] \le 2e^{-t^2/4\psi(m)}.$$
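We will also need the following version of Chebyshev's inequality:

Lemma 5 Let $X_1, \ldots, X_n$ be random variables attaining values 0 and 1, and let $X = \sum_{i=1}^n X_i$. Let $\Delta = \sum_{i \ne j}\mathbb{E}[X_iX_j]$. Then, for all $t > 0$,

$$\Pr[|X - \mathbb{E}[X]| \ge t] \le \frac{1}{t^2}\bigl(\mathbb{E}[X](1 - \mathbb{E}[X]) + \Delta\bigr).$$

Proof: Since $\Pr[|X - \mathbb{E}[X]| \ge t] \le \mathrm{Var}[X]/t^2$ and

$$\mathrm{Var}[X] = \sum_{i,j}\bigl(\mathbb{E}[X_iX_j] - \mathbb{E}[X_i]\mathbb{E}[X_j]\bigr) = \sum_i \mathbb{E}[X_i^2] - \sum_{i,j}\mathbb{E}[X_i]\mathbb{E}[X_j] + \sum_{i \ne j}\mathbb{E}[X_iX_j],$$

the desired conclusion follows by additivity of expectation and the fact that, since $X_i$ is an indicator variable, $X_i^2 = X_i$. ∎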
4 Small graphs



In this section we derive a result essentially saying that Theorem 1 holds if $k$ is sufficiently large in terms of $n$. For technical reasons, we also need to consider bipartite graphs with color classes of unequal sizes.

Proposition 6 For every $\delta > 0$, there exists a (large) positive constant $C$ such that:

(i) If $rs \ge Ck$ and $(r+s)\sqrt{rs} \le \delta k^{3/2}/6$, then with $m_u = m_u(r,s) = 2(1+\delta)\sqrt{rs/k}$, we have

$$\Pr[L(\Sigma(K_{r,s};k)) \ge m_u + t] \le 2e^{-t^2/8(m_u+t)}$$

for all $t \ge 0$.

(ii) If $rs \ge Ck$ and $r + s \le \delta k/6$, then with $m_u$ as above and $m_l = m_l(r,s) = 2(1-\delta)\sqrt{rs/k}$, we have

$$\Pr[L(\Sigma(K_{r,s};k)) \le m_l - t] \le 2e^{-t^2/8m_u}$$

for all $t \ge 0$.

Let $G$ be a random bipartite graph generated according to the random words model $\Sigma(K_{r,s};k)$. The idea of the proof is simple: we show that (ignoring nodes of degree 0) $G$ is "almost" a matching, and the size of the largest planar matching in a random matching corresponds precisely to the length of the longest increasing sequence in a random permutation of the appropriate size.
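The correspondence is exact: if $G$ is the perfect matching $\{i\,\pi(i)\}$ given by a permutation $\pi$, two edges $i\pi(i)$ and $j\pi(j)$ with $i<j$ are noncrossing iff $\pi(i)<\pi(j)$, so planar matchings are exactly increasing subsequences of $\pi$. A small randomized check with self-contained versions of the two dynamic programs (helper names are ours):

```python
import bisect
import random


def lis_length(seq) -> int:
    """Longest strictly increasing subsequence via patience sorting."""
    tails = []
    for x in seq:
        i = bisect.bisect_left(tails, x)
        tails[i:i + 1] = [x]  # replace tails[i], or append when i == len(tails)
    return len(tails)


def max_planar_matching(r: int, s: int, edges) -> int:
    """L(G) by dynamic programming over prefixes of A = {1..r} and B = {1..s}."""
    edge_set = set(edges)
    best = [[0] * (s + 1) for _ in range(r + 1)]
    for i in range(1, r + 1):
        for j in range(1, s + 1):
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            if (i, j) in edge_set:
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + 1)
    return best[r][s]


def matching_from_permutation(perm):
    """Perfect matching joining node i of A to node perm[i-1] of B."""
    return {(i, b) for i, b in enumerate(perm, start=1)}


rng = random.Random(7)
for _ in range(200):
    N = rng.randint(1, 25)
    perm = rng.sample(range(1, N + 1), N)
    assert max_planar_matching(N, N, matching_from_permutation(perm)) == lis_length(perm)
print("planar matchings of random matchings agree with LIS on all trials")
```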

We have to deal with the (usually few) vertices of degree larger than one. To this end, we define a graph $G'$ obtained from $G$ by removing all edges incident to nodes of degree at least 2. Throughout, $E$ and $E'$ denote $E(G)$ and $E(G')$, respectively.

We clearly have $\mathbb{E}[|E|] = rs/k$. We will need a tail bound for large deviations from the expectation; a simple second-moment argument (Chebyshev's inequality) suffices.

Lemma 7 For every $\eta > 0$,

$$\Pr\Bigl[\Bigl||E| - \frac{rs}{k}\Bigr| \ge \eta\,\frac{rs}{k}\Bigr] \le \frac{1}{\eta^2(rs/k)}.$$

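Proof: For $e \in E(K_{r,s})$ let $X_e$ be the indicator of the event $e \in E$. Furthermore, let $X = |E| = \sum_{e \in E(K_{r,s})} X_e$. The $X_e$'s are indicator random variables with expectation $1/k$. Moreover, since $\mathbb{E}[X_eX_f] = 1/k^2$ for $e \ne f$ (whether or not $e$ and $f$ share an endpoint), we have $\sum_{e \ne f}\mathbb{E}[X_eX_f] = rs(rs-1)/k^2 = (\mathbb{E}[X])^2 - \mathbb{E}[X]/k$. Thus, Lemma 5 yields

$$\Pr\bigl[|X - \mathbb{E}[X]| \ge \eta\,\mathbb{E}[X]\bigr] \le \frac{1}{\eta^2\,\mathbb{E}[X]}\Bigl(1 - \frac{1}{k}\Bigr).$$

The desired conclusion follows immediately. ∎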
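For small parameters, the second-moment computation above can be checked exactly by enumerating all $k^{r+s}$ colorings. The sketch below (exact rational arithmetic via `fractions.Fraction`; the function name is ours) confirms $\mathbb{E}[|E|]=rs/k$ and, since $\sum_{e\ne f}\mathbb{E}[X_eX_f]=(\mathbb{E}[X])^2-\mathbb{E}[X]/k$, the identity $\mathrm{Var}[|E|]=\mathbb{E}[|E|](1-1/k)$ that Lemma 5 turns into the bound of Lemma 7.

```python
from fractions import Fraction
from itertools import product


def edge_count_moments(r: int, s: int, k: int):
    """Exact mean and variance of |E| under Sigma(K_{r,s}; k), by brute force over all colorings."""
    total = Fraction(0)
    total_sq = Fraction(0)
    colorings = 0
    for colors in product(range(k), repeat=r + s):
        a_side, b_side = colors[:r], colors[r:]
        e = sum(1 for x in a_side for y in b_side if x == y)  # |E| for this coloring
        total += e
        total_sq += e * e
        colorings += 1
    mean = total / colorings
    return mean, total_sq / colorings - mean ** 2


r, s, k = 2, 3, 3
mean, var = edge_count_moments(r, s, k)
print(mean == Fraction(r * s, k), var == mean * (1 - Fraction(1, k)))  # True True
```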

Now we bound from above the expectation of $|E \setminus E'|$.

Lemma 8

$$\mathbb{E}[|E \setminus E'|] \le (r+s)\,\frac{rs}{k^2}.$$
Proof: Let $Y_w$ equal the degree of $w$ if it is at least 2 and 0 otherwise. Define $Y = \sum_{w \in V(G)} Y_w$. Note that $|E \setminus E'| \le Y$ (equality does not necessarily hold since both endpoints of an edge might have degree at least 2). Let $P_d$ be the probability that a vertex in color class $A$ has exactly $d$ incident edges. For any node $a$ in color class $A$,

$$\mathbb{E}[Y_a] = \sum_{d \ge 2} dP_d = \mathbb{E}[\deg_G(a)] - P_1 = \frac{s}{k} - \frac{s}{k}\Bigl(1 - \frac{1}{k}\Bigr)^{s-1} \le \Bigl(\frac{s}{k}\Bigr)^2$$

(using $(1-x)^h \ge 1 - hx$). Similarly $\mathbb{E}[Y_b] \le (r/k)^2$ for all nodes $b$ in color class $B$, and so

$$\mathbb{E}[|E \setminus E'|] \le \mathbb{E}[Y] \le r\Bigl(\frac{s}{k}\Bigr)^2 + s\Bigl(\frac{r}{k}\Bigr)^2 = (r+s)\,\frac{rs}{k^2}. \qquad ∎$$
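Lemma 8 can likewise be sanity-checked by brute force for tiny parameters. The sketch below computes the exact value of $\mathbb{E}[|E\setminus E'|]$, counting an edge as removed when one of its endpoints has degree at least 2, and compares it with $(r+s)rs/k^2$. It is only a check on small cases of our own choosing (the function name is hypothetical).

```python
from fractions import Fraction
from itertools import product


def expected_removed_edges(r: int, s: int, k: int) -> Fraction:
    """Exact expected number of edges of G not in G', i.e. edges with an endpoint of degree >= 2,
    for G drawn from Sigma(K_{r,s}; k) (enumerates all k**(r+s) colorings)."""
    total = 0
    count = 0
    for colors in product(range(k), repeat=r + s):
        a_side, b_side = colors[:r], colors[r:]
        edges = [(a, b) for a in range(r) for b in range(s) if a_side[a] == b_side[b]]
        deg_a = [0] * r
        deg_b = [0] * s
        for a, b in edges:
            deg_a[a] += 1
            deg_b[b] += 1
        total += sum(1 for a, b in edges if deg_a[a] >= 2 or deg_b[b] >= 2)
        count += 1
    return Fraction(total, count)


r, s, k = 2, 3, 6
exact = expected_removed_edges(r, s, k)
bound = Fraction((r + s) * r * s, k * k)  # right-hand side of Lemma 8
print(exact, bound, exact <= bound)
```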



Proof of Proposition 6. Changing one of the characters associated to a vertex of a bipartite graph $G$ changes the value of $L(G)$ by at most 1. Hence, as a function of the characters, $L(G)$ is 1-Lipschitz. Furthermore, the characters associated to $2w$ nodes of $G$ suffice to certify the existence of $w$ noncrossing edges (and thus $L(G) \ge w$). So Talagrand's inequality applies (with $c = 1$ and $\psi(w) = 2w$) and, with $m$ denoting a median of $L(G)$, yields

$$\Pr[L(G) \ge m + t] \le 2e^{-t^2/8(m+t)} \quad\text{and}\quad \Pr[L(G) \le m - t] \le 2e^{-t^2/8m}.$$
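For the record, this is how the constants come out of Theorem 4: a change of one character moves $L(G)$ by at most $c=1$, and a witness for $L(G)\ge w$ consists of the $2w$ endpoints of $w$ noncrossing edges, so one may take $\psi(w)=2w$:

```latex
\Pr[L(G)\ge m+t]\le 2\exp\Bigl(-\frac{t^2}{4\psi(m+t)}\Bigr)=2e^{-t^2/8(m+t)},
\qquad
\Pr[L(G)\le m-t]\le 2\exp\Bigl(-\frac{t^2}{4\psi(m)}\Bigr)=2e^{-t^2/8m}.
```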

The proposition will follow once we show that $m_l \le m \le m_u$. To prove that $m \le m_u$, it suffices to verify that

$$\Pr[L(G) \ge m_u] < \tfrac{1}{2}. \qquad (2)$$

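Let $\eta > 0$ be a suitable real parameter which we will specify later. We observe that since $|E'| \le |E|$ and $L(G) - L(G') \le |E \setminus E'|$,

$$\Pr[L(G) \ge m_u] \le \Pr\Bigl[|E| \ge (1+\eta)\frac{rs}{k}\Bigr] + \Pr\Bigl[|E \setminus E'| \ge \delta\sqrt{\frac{rs}{k}}\Bigr] + \Pr\Bigl[L(G') \ge (2+\delta)\sqrt{\frac{rs}{k}},\ |E'| < (1+\eta)\frac{rs}{k}\Bigr].$$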



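We bound the terms one by one. By Lemma 8 and Markov's inequality,

$$\Pr\Bigl[|E \setminus E'| \ge \delta\sqrt{\frac{rs}{k}}\Bigr] \le \frac{r+s}{\delta k}\sqrt{\frac{rs}{k}} \le \frac{1}{6},$$

where the last inequality is the hypothesis $(r+s)\sqrt{rs} \le \delta k^{3/2}/6$.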



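Taking $N = (1+\eta)rs/k$ and $\lambda = (2+\delta)/\sqrt{1+\eta} - 2 > 0$ in Theorem 3, we get that

$$\Pr\Bigl[L(G') \ge (2+\delta)\sqrt{\frac{rs}{k}},\ |E'| < (1+\eta)\frac{rs}{k}\Bigr] \le B_1\exp\bigl(-c\lambda^{3/2}N^{1/2}\bigr). \qquad (3)$$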



From Lemma 7 and (3), it follows that

$$\Pr[L(G) \ge m_u] \le \frac{1}{\eta^2(rs/k)} + \frac{1}{6} + B_1\exp\bigl(-c\lambda^{3/2}N^{1/2}\bigr). \qquad (4)$$

So, (2) follows by taking, say, $\eta = \sqrt{6/C}$ and using $rs \ge Ck$.

To establish that $m_l \le m$, we proceed as before, i.e., we show that

$$\Pr[L(G) \le m_l] < \tfrac{1}{2}. \qquad (5)$$



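Indeed, observe that since $|E'| = |E| - |E \setminus E'|$ and $L(G') \le L(G)$,

$$\Pr[L(G) \le m_l] \le \Pr\Bigl[|E| \le (1-\eta)\frac{rs}{k}\Bigr] + \Pr\Bigl[|E \setminus E'| \ge \delta\,\frac{rs}{k}\Bigr] + \Pr\Bigl[L(G') \le 2(1-\delta)\sqrt{\frac{rs}{k}},\ |E'| \ge (1-\eta-\delta)\frac{rs}{k}\Bigr].$$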



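We again bound the terms one by one, applying, as done above, Lemma 7, Markov's inequality (together with Lemma 8 and the hypothesis $r + s \le \delta k/6$), and Theorem 3, respectively. Indeed, for a suitable real value $\eta > 0$ and $\lambda = 2 - 2(1-\delta)/\sqrt{1-\eta-\delta} > 0$ we get

$$\Pr[L(G) \le m_l] \le \frac{1}{\eta^2(rs/k)} + \frac{1}{6} + B_1\exp\Bigl(-c\lambda^3(1-\eta-\delta)\frac{rs}{k}\Bigr).$$

So, (5) follows by taking again $\eta = \sqrt{6/C}$ and using $rs \ge Ck$. Proposition 6 is proved.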
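The choice $\eta=\sqrt{6/C}$ works in both parts because of the following arithmetic, which uses only the hypothesis $rs\ge Ck$ and the bound (4) (a worked restatement of the last step, not an additional claim):

```latex
\eta^2\,\frac{rs}{k} = \frac{6}{C}\cdot\frac{rs}{k} \ge \frac{6}{C}\cdot C = 6
\quad\Longrightarrow\quad \frac{1}{\eta^2(rs/k)} \le \frac16,
\qquad\text{so by (4)}\qquad
\Pr[L(G)\ge m_u] \le \frac13 + B_1\exp\bigl(-c\lambda^{3/2}N^{1/2}\bigr).
```

The right-hand side is below $1/2$ once $C$ is large enough that $\eta$ is small compared with $\delta$ (which keeps $\lambda$ bounded away from 0) and $B_1\exp(-c\lambda^{3/2}N^{1/2})<1/6$, using $N\ge rs/k\ge C$; the lower-tail inequality (5) is handled the same way.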



5 The lower bound in Theorem 1



In this section we establish the lower bound on the expectation of $L(\Sigma(K_{n,n};k))$ and the lower tail bound for its distribution.

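Given $\varepsilon$, let $\delta > 0$ be such that $(1-2\delta)^2 = 1-\varepsilon$, and let $C = C(\delta)$ be as in Proposition 6. Fix $C' \ge \sqrt{C}$ large enough so that $\exp\bigl(-\delta^2C'/(4(1+\delta))\bigr) \le \delta$.

Let $\hat{n}(k) = \lfloor \delta k/12 \rfloor$. Proposition 6(ii) applies (with $r = s = n$ and $C'\sqrt{k} \le n \le \hat{n}(k)$) for $k \ge k_0$, where $k_0$ is such that $\hat{n}(k) \ge C'\sqrt{k}$ for all $k \ge k_0$. It follows that, for such $k$ and $n$ and with $G$ drawn from $\Sigma(K_{n,n};k)$,

$$\mathbb{E}[L(\Sigma(K_{n,n};k))] \ge (1-2\delta)\frac{2n}{\sqrt{k}}\cdot\Pr\Bigl[L(G) \ge 2(1-2\delta)\frac{n}{\sqrt{k}}\Bigr] \ge (1-2\delta)\frac{2n}{\sqrt{k}}\Bigl(1 - 2\exp\Bigl(-\frac{\delta^2}{4(1+\delta)}\cdot\frac{n}{\sqrt{k}}\Bigr)\Bigr) \ge (1-2\delta)^2\,\frac{2n}{\sqrt{k}}.$$

The desired lower bound on the expectation follows since, by subadditivity, $(1/n)\cdot\mathbb{E}[L(\Sigma(K_{n,n};k))]$ is nondecreasing.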
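The middle inequality above is Proposition 6(ii) with $r=s=n$ and $t=2\delta n/\sqrt{k}$; spelled out:

```latex
m_l - t = 2(1-\delta)\frac{n}{\sqrt k} - 2\delta\frac{n}{\sqrt k} = 2(1-2\delta)\frac{n}{\sqrt k},
\qquad
\frac{t^2}{8m_u} = \frac{4\delta^2 n^2/k}{16(1+\delta)\,n/\sqrt k} = \frac{\delta^2}{4(1+\delta)}\cdot\frac{n}{\sqrt k}.
```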

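Now we establish the lower tail bound. Let $\hat{n} = \lceil C'\sqrt{k} \rceil$ and $q = \lfloor n/\hat{n} \rfloor$. Moreover, let $G$ be chosen according to $\Sigma(K_{n,n};k)$ and let $G_i$ be the subgraph induced in $G$ by the vertices $(i-1)\hat{n} + 1, \ldots, i\hat{n}$ in each color class, $i = 1, \ldots, q$. We observe that $L(G_1), \ldots, L(G_q)$ are independent and identically distributed with the distribution of $L(\Sigma(K_{\hat{n},\hat{n}};k))$, and $L(G) \ge L(G_1) + \cdots + L(G_q)$. Let $\mu = \mathbb{E}[L(G_1)]$ and $t = \varepsilon(2n/\sqrt{k})$. Since $n < (q+1)\hat{n}$, the lower bound on $\mu$ proved above yields that

$$\Pr\Bigl[L(G) \le (1-3\varepsilon)\frac{2n}{\sqrt{k}}\Bigr] \le \Pr\Bigl[\sum_{i=1}^q L(G_i) \le q\mu - t + (\mu - t)\Bigr].$$

An argument similar to the one used above to derive the bound $\mu \ge (1-\varepsilon)2\hat{n}/\sqrt{k}$ can be used to obtain $\mu \le (1+\varepsilon)2\hat{n}/\sqrt{k}$ from Proposition 6. Let $n$ be large enough so that $n \ge \hat{n}(1+2\varepsilon)/\varepsilon$. Thus, $q \ge (1+\varepsilon)/\varepsilon$ and $t \ge \varepsilon q\mu/(1+\varepsilon) \ge \mu$ (since $q\mu \le (1+\varepsilon)2q\hat{n}/\sqrt{k} \le (1+\varepsilon)t/\varepsilon$). Hence, a standard Chernoff bound [10, Theorem 2.1] implies that

$$\Pr\Bigl[L(G) \le (1-3\varepsilon)\frac{2n}{\sqrt{k}}\Bigr] \le \Pr\Bigl[\sum_{i=1}^q L(G_i) \le q\mu - t\Bigr] \le \exp\Bigl(-\frac{t^2}{2q\mu}\Bigr) \le \exp\Bigl(-\frac{\varepsilon^2 q\mu}{2(1+\varepsilon)^2}\Bigr).$$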
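The inequality $L(G)\ge L(G_1)+\cdots+L(G_q)$ holds because planar matchings of the diagonal blocks $G_i$ use disjoint, consecutively ordered node sets and therefore combine into a planar matching of $G$. A quick randomized check on random words (the block size, number of blocks and $k$ are arbitrary illustrative choices; helper names are ours):

```python
import random


def lcs_length(u, v) -> int:
    """Classic LCS dynamic program with a single rolling row."""
    prev = [0] * (len(v) + 1)
    for x in u:
        cur = [0] * (len(v) + 1)
        for j, y in enumerate(v, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def whole_vs_blocks(n_hat: int, q: int, k: int, seed: int):
    """Return (L(G), sum of L(G_i)) for random words of length q * n_hat cut into q diagonal blocks."""
    rng = random.Random(seed)
    n = q * n_hat
    u = [rng.randrange(k) for _ in range(n)]
    v = [rng.randrange(k) for _ in range(n)]
    blocks = [lcs_length(u[i * n_hat:(i + 1) * n_hat], v[i * n_hat:(i + 1) * n_hat]) for i in range(q)]
    return lcs_length(u, v), sum(blocks)


for seed in range(5):
    whole, blocks = whole_vs_blocks(n_hat=20, q=5, k=10, seed=seed)
    assert whole >= blocks
    print(seed, whole, blocks)
```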



6 The upper bound in Theorem 1

We will only discuss the tail bound since $L(\Sigma(K_{n,n};k)) \le n$ always, and so the claimed estimate for the expectation follows from the tail bound.

Let $\varepsilon > 0$ be fixed. We choose a sufficiently small $\delta = \delta(\varepsilon) > 0$, much smaller than $\varepsilon$. Requirements on $\delta$ will be apparent from the subsequent proof.

Henceforth, we fix constants $1/2 < \alpha < \beta < 3/4$ (any choice of $\alpha$ and $\beta$ in the specified range would suffice for our purposes). In this section, we will always assume that $k \ge k_0$ for a sufficiently large integer $k_0 = k_0(\varepsilon)$, and that $n$ is sufficiently large compared to $k$: $n \ge k^{\beta}$, say. Note that for $n \le k^{\beta}$ (and $k$ sufficiently large), the tail bound of Theorem 1 follows from Proposition 6.

Block partitions. Let us write

$$m_{\max} = (1+\varepsilon)\,\frac{2n}{\sqrt{k}}$$

for the upper bound on the expected size of a planar matching as in Theorem 1. We also define an auxiliary parameter

$$\ell = k^{\alpha}.$$

This is a somewhat arbitrary choice (but given by a simple formula). The essential requirements on $\ell$ are that $\ell$ be much larger than $\sqrt{k}$ and much smaller than $k^{3/4}$. We note that $n/\ell$ is large by our assumption $n \ge k^{\beta}$.

Let $M$ be a planar matching with $m_{\max}$ edges on the sets $A$ and $B$, $|A| = |B| = n$. We define a partition of $M$ into blocks of consecutive edges. There will be roughly $n/\ell$ blocks, each of them containing at most

$$e_{\max} = \frac{1}{\delta}\cdot\frac{\ell}{n}\cdot m_{\max}$$

edges of $M$. So $e_{\max}$ is of order $\ell/\sqrt{k}$, which by our assumptions can be assumed to be larger than any prescribed constant. Moreover, we require that no block is "spread" over more than $\ell$ consecutive nodes in $A$ or in $B$.

Formally, the $i$th block of the partition will be specified by nodes $a_i, a'_i \in A$ and $b_i, b'_i \in B$; $a_ib_i \in M$ is the first edge in the block and $a'_ib'_i \in M$ is the last edge (the block may contain only one edge, and so $a_ib_i = a'_ib'_i$ is possible). The edge $a_1b_1$ is the first edge of $M$, and $a_{i+1}b_{i+1}$ is the edge of $M$ immediately following $a'_ib'_i$. Finally, given $a_ib_i$, the edge $a'_ib'_i$ is taken as the rightmost edge of $M$ such that

• the $i$th block has at most $e_{\max}$ edges of $M$, and

• $a'_i - a_i < \ell$ and $b'_i - b_i < \ell$ (here and in the sequel, with a little abuse of notation, we regard the nodes in $A$ and those in $B$ as natural numbers $1, 2, \ldots, n$, although of course, the nodes in $A$ are distinct from those of $B$).

Let $q$ denote the number of blocks obtained in this way. It is easily seen that $q = O(n/\ell)$. A block partition is schematically illustrated in Fig. 1.



Figure 1: A block partition.
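The greedy construction of the blocks is easy to state in code. The sketch below is our own illustration (hypothetical function name); it assumes the planar matching is given as a list of pairs $(a,b)$, which can be sorted by $a$ because planarity makes the order by $a$ agree with the order by $b$. A block is closed as soon as adding the next edge would exceed $e_{\max}$ edges or make $a'_i-a_i$ or $b'_i-b_i$ reach $\ell$.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def block_partition(matching: List[Edge], e_max: int, ell: int) -> List[List[Edge]]:
    """Split a planar matching into consecutive blocks, each with at most e_max edges and
    with first edge (a, b) and last edge (a', b') satisfying a' - a < ell and b' - b < ell."""
    edges = sorted(matching)  # planar, so this is also sorted by the B-endpoint
    blocks: List[List[Edge]] = []
    i = 0
    while i < len(edges):
        a_first, b_first = edges[i]
        j = i + 1
        # extend the block to the rightmost edge that keeps both constraints
        while (j < len(edges) and j - i < e_max
               and edges[j][0] - a_first < ell and edges[j][1] - b_first < ell):
            j += 1
        blocks.append(edges[i:j])
        i = j
    return blocks


M = [(1, 1), (2, 3), (4, 4), (7, 8), (8, 9), (9, 10)]
print(block_partition(M, e_max=2, ell=3))
# [[(1, 1), (2, 3)], [(4, 4)], [(7, 8), (8, 9)], [(9, 10)]]
```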

Counting the types. Let $e_i$ be the number of edges of $M$ in the $i$th block. Let us call the $5q$-tuple $T = (a_1, a'_1, b_1, b'_1, e_1, \ldots, a_q, a'_q, b_q, b'_q, e_q)$ the type of the block partition of $M$, and let us write $T = T(M)$. Let $\mathcal{T}$ denote the set of all possible types of block partitions of planar matchings as above.



Lemma 9 We have

$$|\mathcal{T}| \le \exp\Bigl(C_1\,\frac{n}{\ell}\log\ell\Bigr)$$

with a suitable absolute constant $C_1$.



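Proof: The number of choices for $a_1, \ldots, a_q$ is at most the number of ways of choosing $q$ elements out of $n$, i.e., $\binom{n}{q}$, and the same holds for each of the sequences $(a'_i)$, $(b_i)$ and $(b'_i)$. Since $m_{\max} \le n$, the number of choices for the $e_i$ is no larger than the number of ways of writing $m_{\max}$ as an ordered sum of $q$ positive summands, which is $\binom{m_{\max}-1}{q-1} \le \binom{n}{q}$. Grossly overestimating, for a fixed $q$ we can thus bound the number of types by $\binom{n}{q}^5$. Using the standard estimate $\binom{n}{q} \le (en/q)^q$ and $q = O(n/\ell)$, we get $\log|\mathcal{T}| = O((n/\ell)\log\ell)$ as claimed. ∎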
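Spelled out, the count behind Lemma 9 is as follows. Here $q_{\max}=O(n/\ell)$ denotes an upper bound on the number of blocks; we use that $q\mapsto q\log(en/q)$ is nondecreasing for $q\le n$, sum over the at most $n$ possible values of $q$, and note that the resulting $\log n$ term is negligible because $n/\ell\ge n^{1-\alpha/\beta}$:

```latex
|\mathcal T| \le \sum_{q\le q_{\max}} \binom{n}{q}^{5}
\le n\Bigl(\frac{en}{q_{\max}}\Bigr)^{5q_{\max}},
\qquad
\log|\mathcal T| \le \log n + 5\,q_{\max}\log\frac{en}{q_{\max}}
= O\Bigl(\frac{n}{\ell}\log \ell\Bigr).
```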



The probability of a matching with a given type of block partition. Next we show that for every fixed type $T$, the probability that our random graph contains a planar matching of size $m_{\max}$ with that type of block partition is very small.
Lemma 10 Let $n$ and $k$ be as above. For any given type $T \in \mathcal{T}$, the probability $p_T$ that the random graph $\Sigma(K_{n,n};k)$ contains a planar matching $M$ with $m_{\max}$ edges and with $T(M) = T$ satisfies

$$p_T \le \exp\Bigl(-c\,\varepsilon^2\delta\,\frac{n}{\sqrt{k}}\Bigr)$$

with a suitable absolute constant $c > 0$.

Proof: Let $G_i$ denote the subgraph of the considered random graph $\Sigma(K_{n,n};k)$ induced by the nodes $a_i, a_i + 1, \ldots, a'_i$ and $b_i, b_i + 1, \ldots, b'_i$. We note that the distribution of $G_i$ is the same as that of $\Sigma(K_{r_i,s_i};k)$, where $r_i = a'_i - a_i + 1$ and $s_i = b'_i - b_i + 1$.

A necessary condition for the existence of a planar matching $M$ with $T(M) = T$ is $L(G_i) \ge e_i$ for all $i = 1, 2, \ldots, q$. Crucially for the proof, the events $L(G_i) \ge e_i$ are independent for distinct $i$ (the $G_i$ are induced by pairwise disjoint sets of nodes), and so we have

$$p_T \le \prod_{i=1}^q \Pr[L(\Sigma(K_{r_i,s_i};k)) \ge e_i].$$

The plan is to apply Proposition 6(i) for each $i$. The construction of the block partition guarantees that $r_i, s_i \le \ell$, and so the condition $(r_i + s_i)\sqrt{r_is_i} \le \delta k^{3/2}/6$ in Proposition 6 is satisfied (for $k$ large, since $2\ell^2 = 2k^{2\alpha}$ and $2\alpha < 3/2$). However, the condition $r_is_i \ge Ck$ may fail. To remedy this, we artificially enlarge the blocks; clearly, this can only increase the probability that a planar matching of size $e_i$ is present.

Let us call the $i$th block short if it is the last block, i.e., $i = q$, or if $e_i = e_{\max}$. Let $S \subseteq [q]$ denote the set of all indices of short blocks. We have $(|S| - 1)e_{\max} \le m_{\max}$, and since $e_{\max} \ge \frac{1}{\delta}\cdot\frac{\ell}{n}\cdot m_{\max} - 1 \ge \frac{1}{2\delta}\cdot\frac{\ell}{n}\cdot m_{\max}$, we obtain $|S| \le 2\delta n/\ell$.

The blocks that are not short are called regular, and we write $R = [q] \setminus S$. For a regular block $i$, we have $\max(a_{i+1} - a_i, b_{i+1} - b_i) \ge \ell$ by the construction of the block partition.

Now we define the sizes of the artificially enlarged graphs, which will replace the $G_i$ in the subsequent calculation. Namely, for a short block ($i \in S$), we set

$$\hat{r}_i = \hat{s}_i = \ell.$$

For a regular block ($i \in R$), we distinguish two cases. If $a_{i+1} - a_i \ge \ell$, we set $\hat{r}_i = \ell$ and $\hat{s}_i = \max(\delta\ell, s_i)$. Otherwise, we set $\hat{r}_i = \max(\delta\ell, r_i)$ and $\hat{s}_i = \ell$.

In the first case above, we have $\hat{r}_i \le a_{i+1} - a_i$ and $\hat{s}_i - s_i \le \delta\ell$, and similarly for the second case. Therefore, $\sum_{i \in R}\hat{r}_i \le n + \delta\ell\cdot|R| = (1+O(\delta))n$, with an absolute constant in the $O(\cdot)$ notation, and similarly $\sum_{i \in R}\hat{s}_i \le (1+O(\delta))n$. For $i \in S$ we find $\sum_{i \in S}\hat{r}_i = \sum_{i \in S}\hat{s}_i \le |S|\cdot\ell \le 2\delta n$. Altogether

$$\sum_{i=1}^q \hat{r}_i \le (1+O(\delta))n, \qquad \sum_{i=1}^q \hat{s}_i \le (1+O(\delta))n. \qquad (6)$$

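Now $\hat{r}_i$ and $\hat{s}_i$ already satisfy the requirements of Proposition 6(i), since we have $\hat{r}_i\hat{s}_i \ge \delta\ell^2 = \delta k^{2\alpha} \ge Ck$ and $(\hat{r}_i + \hat{s}_i)\sqrt{\hat{r}_i\hat{s}_i} \le 2\ell^2 = 2k^{2\alpha} \le \delta k^{3/2}/6$. We thus have, by Proposition 6(i),

$$\Pr[L(\Sigma(K_{\hat{r}_i,\hat{s}_i};k)) \ge e_i] \le 2e^{-(e_i - m_u(\hat{r}_i,\hat{s}_i))^2/8e_i}$$

for all $i$ such that $e_i \ge m_u(\hat{r}_i,\hat{s}_i)$, where $m_u(r,s) = 2(1+\delta)\sqrt{rs/k}$. In the denominator of the exponent, we estimate $e_i \le e_{\max}$. We thus have

$$p_T \le \prod_{i=1}^q 2e^{-\max\{0,\,e_i - m_u(\hat{r}_i,\hat{s}_i)\}^2/8e_{\max}}$$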
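(for $i$ with $e_i < m_u(\hat{r}_i,\hat{s}_i)$ the exponential factor equals 1, so the bound holds trivially). We consider the logarithm of $p_T$, and we use the Cauchy–Schwarz inequality and the inequality $\max(0,x) + \max(0,y) \ge \max(0,x+y)$:

$$-\ln p_T \ge \frac{1}{8e_{\max}}\sum_{i=1}^q \max\{0,\,e_i - m_u(\hat{r}_i,\hat{s}_i)\}^2 - q\ln 2 \ge \frac{1}{8e_{\max}q}\max\Bigl\{0,\ \sum_{i=1}^q e_i - \sum_{i=1}^q m_u(\hat{r}_i,\hat{s}_i)\Bigr\}^2 - q\ln 2.$$

The function $(x,y) \mapsto \sqrt{xy}$ is subadditive: $\sqrt{xy} + \sqrt{x'y'} \le \sqrt{(x+x')(y+y')}$. Thus, using (6), we have

$$\sum_{i=1}^q m_u(\hat{r}_i,\hat{s}_i) = \frac{2(1+\delta)}{\sqrt{k}}\sum_{i=1}^q\sqrt{\hat{r}_i\hat{s}_i} \le \frac{2(1+\delta)}{\sqrt{k}}\sqrt{\sum_{i=1}^q\hat{r}_i\,\sum_{i=1}^q\hat{s}_i} \le (1+O(\delta))\,\frac{2n}{\sqrt{k}},$$

and so, since $\sum_i e_i = m_{\max}$, $e_{\max}q = O(m_{\max}/\delta)$ (because $q = O(n/\ell)$), and $\ell \gg \sqrt{k}$,

$$-\ln p_T \ge \Omega\Bigl(\frac{\delta\sqrt{k}}{n}\Bigl((1+\varepsilon)\frac{2n}{\sqrt{k}} - (1+O(\delta))\frac{2n}{\sqrt{k}}\Bigr)^2\Bigr) - q\ln 2 = \Omega\Bigl(\varepsilon^2\delta\cdot\frac{n}{\sqrt{k}}\Bigr),$$

where we used $\delta \ll \varepsilon$ and $q\ln 2 = O(n/\ell) = o(n/\sqrt{k})$. Lemma 10 is proved.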
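The subadditivity of $(x,y)\mapsto\sqrt{xy}$ used in this proof is just the Cauchy–Schwarz inequality for the vectors $(\sqrt{x},\sqrt{x'})$ and $(\sqrt{y},\sqrt{y'})$; iterating it over the enlarged block sizes $\hat r_i,\hat s_i$ and using (6) gives:

```latex
\sqrt{xy}+\sqrt{x'y'} = \sqrt{x}\sqrt{y}+\sqrt{x'}\sqrt{y'} \le \sqrt{x+x'}\,\sqrt{y+y'},
\qquad\text{hence}\qquad
\sum_{i=1}^{q}\sqrt{\hat r_i\hat s_i}\le\sqrt{\Bigl(\sum_{i=1}^{q}\hat r_i\Bigr)\Bigl(\sum_{i=1}^{q}\hat s_i\Bigr)}\le (1+O(\delta))\,n .
```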



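Proof of Theorem 1. We have

$$\Pr[L(\Sigma(K_{n,n};k)) \ge m_{\max}] \le \sum_{T \in \mathcal{T}} p_T \le |\mathcal{T}|\cdot\max_{T}p_T.$$

The sought-after estimate

$$\Pr[L(\Sigma(K_{n,n};k)) \ge m_{\max}] \le \exp\Bigl(-\Omega\Bigl(\varepsilon^2\delta\,\frac{n}{\sqrt{k}}\Bigr)\Bigr)$$

follows from Lemmas 9 and 10, since $(n/\ell)\log\ell = o(n/\sqrt{k})$ for $\ell = k^{\alpha}$ with $\alpha > 1/2$. ∎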
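The last step is a comparison of exponents: Lemma 9 contributes at most $C_1(n/\ell)\log\ell$, Lemma 10 subtracts a term of order $n/\sqrt{k}$ (with a constant $c'>0$ depending on $\varepsilon$ and $\delta$), and with $\ell=k^\alpha$, $\alpha>1/2$, the former is negligible:

```latex
\frac{(n/\ell)\log \ell}{n/\sqrt{k}} = \frac{\alpha\log k}{k^{\alpha-1/2}} \xrightarrow[k\to\infty]{} 0,
\qquad\text{so}\qquad
|\mathcal T|\cdot\max_{T}p_T \le \exp\Bigl(C_1\frac{n}{\ell}\log\ell - c'\frac{n}{\sqrt k}\Bigr) = \exp\Bigl(-\Omega\Bigl(\frac{n}{\sqrt k}\Bigr)\Bigr).
```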



7 Extensions

Similarly one can prove results for the Erdős model analogous to those obtained in previous sections (essentially, $k$ is now replaced by $1/p$):

Theorem 11 For every $\varepsilon > 0$ there exist constants $p_0 \in (0,1)$ and $C$ such that for all $p \le p_0$ and all $n$ with $n\sqrt{p} \ge C$ we have

$$(1-\varepsilon)\cdot 2n\sqrt{p} \le \mathbb{E}[L(\mathcal{G}(K_{n,n};p))] \le (1+\varepsilon)\cdot 2n\sqrt{p}.$$

Moreover, there is an exponentially small tail bound; namely, for every $\varepsilon > 0$ there exists $c > 0$ such that for $p$ and $n$ as above,

$$\Pr\bigl[|L(\mathcal{G}(K_{n,n};p)) - 2n\sqrt{p}| \ge \varepsilon\,2n\sqrt{p}\bigr] \le e^{-cn\sqrt{p}}.$$

Subadditivity arguments yield that $\mathbb{E}[L(\mathcal{G}(K_{n,n};p))]/n$ converges to a constant $\lambda_p$ as $n \to \infty$. The previous theorem thus implies that $\lambda_p/\sqrt{p} \to 2$ as $p \to 0$.
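As in the random words case, Theorem 11 can be illustrated (not proved) by simulation. The sketch below samples $\mathcal{G}(K_{n,n};p)$ edge by edge inside the prefix dynamic program for $L$ and reports $L/(2n\sqrt{p})$; parameter values and names are our own illustrative choices.

```python
import math
import random


def normalized_binomial_L(n: int, p: float, seed: int = 0) -> float:
    """Sample G(K_{n,n}; p) once and return L(G) / (2 n sqrt(p)), with L computed by DP."""
    rng = random.Random(seed)
    prev = [0] * (n + 1)  # prev[j] = L of the graph induced by the first i-1 nodes of A and {1..j} in B
    for _ in range(n):  # process node i of A
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            has_edge = rng.random() < p  # edge ij is present independently with probability p
            cur[j] = max(prev[j], cur[j - 1], prev[j - 1] + 1 if has_edge else 0)
        prev = cur
    return prev[n] / (2 * n * math.sqrt(p))


for p in (0.01, 0.0025):
    print(p, round(normalized_binomial_L(n=400, p=p), 3))
```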

Also, similar results hold for the $\mathcal{G}(K_{r,s};p)$ model as those derived for $\Sigma(K_{r,s};k)$. Specifically,

Proposition 12 For every $\delta > 0$, there exists a (large) positive constant $C$ such that:

(i) If $rs \ge C/p$ and $(r+s)\sqrt{rs} \le \delta/(6p^{3/2})$, then with $m_u = m_u(r,s) = 2(1+\delta)\sqrt{rsp}$, we have

$$\Pr[L(\mathcal{G}(K_{r,s};p)) \ge m_u + t] \le 2e^{-t^2/8(m_u+t)}$$

for all $t \ge 0$.

(ii) If $rs \ge C/p$ and $r + s \le \delta/(6p)$, then with $m_u$ as above and $m_l = m_l(r,s) = 2(1-\delta)\sqrt{rsp}$, we have

$$\Pr[L(\mathcal{G}(K_{r,s};p)) \le m_l - t] \le 2e^{-t^2/8m_u}$$

for all $t \ge 0$.

In [11], Johansson implicitly considers a model somewhat related to the $\mathcal{G}(K_{n,n};p)$ model, specifically, a distribution $\mathcal{G}^*(K_{n,n};p)$ over weighted instances of $K_{n,n}$. The weight of each edge is a geometrically distributed random variable taking the value $j \in \mathbb{N}$ with probability $(1-p)^jp$, and the edge weights are mutually independent. Denoting the maximum weight of a planar matching of an instance drawn according to $\mathcal{G}^*(K_{n,n};p)$ by $L(\mathcal{G}^*(K_{n,n};p))$, Johansson's result [11, Theorem 1.1] says that for all $p \in (0,1)$,

$$\lim_{n\to\infty}\frac{1}{n}\cdot\mathbb{E}[L(\mathcal{G}^*(K_{n,n};p))] = \frac{(1+\sqrt{1-p})^2}{p}.$$

Note that an instance $G$ of $\mathcal{G}(K_{n,n};p)$ can be obtained from one drawn according to $\mathcal{G}^*(K_{n,n};p)$ by including in $G$ only those edges of $K_{n,n}$ with nonzero weight. Hence,

$$\mathbb{E}[L(\mathcal{G}(K_{n,n};p))] \le \mathbb{E}[L(\mathcal{G}^*(K_{n,n};p))].$$

It follows that $\lambda_p \le (1+\sqrt{1-p})^2/p$ for all $p \in (0,1)$. We shall see below that known results imply a much stronger bound on $\lambda_p$ for not too large values of $p$.

Gravner, Tracy and Widom [?] consider processes associated to random (0,1)-matrices where each entry takes the value 1 with probability $p$, independently of the values of other matrix entries. In particular, they study a process called oriented digital boiling (ODB) and analyze the behavior of a so-called height function which equals, in distribution, the length of the longest sequence $(i_1,j_1), (i_2,j_2), \ldots$ of positions in a random $n \times n$ (0,1)-matrix which have entry 1 such that the $i_t$'s are increasing and the $j_t$'s are nondecreasing. In contrast, $L(\mathcal{G}(K_{n,n};p))$ equals in distribution the length of the longest such sequence with both the $i_t$'s and the $j_t$'s increasing. This latter model is referred to as strict oriented digital boiling in [?], but no results are claimed for it. Clearly, the ODB process dominates the strict ODB process. Hence, [?, §3, (1)] implies that for any $p < 1/2$,

$$\lambda_p \le \kappa_p = 2\sqrt{p(1-p)},$$

where $\kappa_p$ denotes the limit, as $n \to \infty$, of the expected ODB height normalized by $n$; this in turn implies that $\limsup_{p\to 0}\lambda_p/\sqrt{p} \le 2$. Nevertheless, our derivation of this latter limit value is elementary in comparison with the highly technical nature of [?].
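The domination claim can be seen concretely: any chain with both coordinates strictly increasing is also admissible when the column indices are only required to be nondecreasing. The sketch below (our own illustration; names are hypothetical) computes both longest chains in a random (0,1)-matrix by dynamic programming.

```python
import random


def longest_chain(matrix, weak_columns: bool) -> int:
    """Longest sequence of 1-entries (i_1, j_1), (i_2, j_2), ... with the i's strictly increasing
    and the j's nondecreasing (weak_columns=True, ODB) or strictly increasing (False, strict ODB)."""
    n, m = len(matrix), len(matrix[0])
    # f[i][j] = longest chain using rows < i and columns < j
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            one = matrix[i - 1][j - 1]
            if weak_columns:
                # the previous entry may sit in the same column, so extend f[i-1][j]
                f[i][j] = max(f[i][j - 1], f[i - 1][j] + one)
            else:
                f[i][j] = max(f[i - 1][j], f[i][j - 1], f[i - 1][j - 1] + one)
    return f[n][m]


rng = random.Random(2)
n, p = 300, 0.02
A = [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]
print(longest_chain(A, weak_columns=True), longest_chain(A, weak_columns=False))
```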